NEW DNA MOLECULES

TECHNICAL FIELD

The present invention provides novel nucleic acid molecules coding for
sigma subunits of Mycobacterium tuberculosis RNA polymerase. It also
relates to polypeptides, referred to as SigA and SigB, encoded by such
nucleic acid molecules, as well as to vectors and host cells transformed
with the said nucleic acid molecules. The invention further provides
screening assays for compounds which inhibit the interaction between a
sigma subunit and a core RNA polymerase.



BACKGROUND ART 

Transcription of genes to the corresponding RNA molecules is a complex
process which is catalyzed by DNA-dependent RNA polymerase, and
involves many different protein factors. In eubacteria, the core RNA
polymerase is composed of α, β, and β' subunits in the ratio 2:1:1. To
direct RNA polymerase to promoters of specific genes to be transcribed,
bacteria produce a variety of proteins, known as sigma (σ) factors, which
interact with RNA polymerase to form an active holoenzyme. The resulting
complexes are able to recognize and attach to selected nucleotide sequences
in promoters.

Physical measurements have shown that the sigma subunit induces a
conformational transition upon binding to the core RNA polymerase.
Binding of the sigma subunit to the core enzyme increases the binding
constant of the core enzyme for DNA by several orders of magnitude
(Chamberlin, M.J. (1974) Ann. Rev. Biochem. 43, 721-).

Characterisation of sigma subunits, identified and sequenced from various
organisms, allows them to be classified into two broad categories: Group I
and Group II. The Group I sigma has also been referred to as the sigma 70
class, or the "house keeping" sigma group. Sigma subunits belonging to
this group recognise similar promoter sequences in the cell. These
properties are reflected in certain regions of the proteins which are highly
conserved between species.

Bacterial sigma factors do not have any homology with eukaryotic
transcription factors, and are consequently a potential target for
antibacterial compounds. Mutations in the sigma subunit, affecting its
association and ability to confer DNA sequence specificity to the enzyme,
are known to be lethal to the cell.



Mycobacterium tuberculosis is a major pulmonary pathogen which is
characterized by its very slow growth rate. As a pathogen it gains access to
alveolar macrophages where it multiplies within the phagosome, finally
lysing the cells and being disseminated through the blood stream, not only
to other areas of the lung, but also to extrapulmonary tissues. Thus the
pathogen multiplies in at least two entirely different environments, which
would involve the utilisation of different nutrients and a variety of possible
host factors; a successful infection would thus involve the coordinated
expression of new sets of genes. This regulation would resemble different
physiological stages, as best exemplified by Bacillus, in which genes
specific for different stages are transcribed by RNA polymerases
associating with different sigma factors. This provides the possibility of
targeting not only the house keeping sigma of M. tuberculosis, but also
sigma subunits specific for the different stages of infection and
dissemination.



WO 96/38478 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1: Map of plasmid pARC 8175
Fig. 2: Map of plasmid pARC 8176

PURPOSE OF THE INVENTION 

Since the association with a specific sigma subunit is essential for the
specificity of RNA polymerase, this process of association is a suitable
target for drug design. In order to identify compounds capable of
inhibiting the said association process, the identification of the primary
structures of sigma subunits is desirable.

It is thus the purpose of the invention to provide information on sequences
and structure of sigma subunits, which information will enable the
screening, identification and design of compounds competing with the
sigma subunit for binding to the core RNA polymerase, which compounds
may be developed into effective therapeutic agents.

DISCLOSURE OF THE INVENTION 

Throughout this description and in particular in the following examples,
the terms "standard protocols" and "standard procedures", when used in
the context of molecular cloning techniques, are to be understood as
protocols and procedures found in an ordinary laboratory manual such as:
Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: A
laboratory manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY.

In a first aspect, this invention provides an isolated polypeptide which is a
Group I sigma subunit of Mycobacterium tuberculosis RNA polymerase, or a
functionally equivalent modified form thereof.

Preferred such polypeptides having amino acid sequences according to
SEQ ID NO: 2 or 4 of the Sequence Listing have been obtained by
recombinant DNA techniques and are hereinafter referred to as SigA and
SigB polypeptides. However, it will be understood that the polypeptides
according to the invention are not limited strictly to polypeptides with an
amino acid sequence identical with SEQ ID NO: 2 or 4 in the Sequence
Listing. Rather the invention additionally encompasses modified forms of
these native polypeptides carrying modifications like substitutions, small
deletions, insertions or inversions, which polypeptides nevertheless have
substantially the biological activities of a M. tuberculosis sigma subunit.
Such biological activities comprise the ability to associate with the core
enzyme and/or confer the property of promoter sequence recognition
and initiation of transcription. Included in the invention are consequently
polypeptides, the amino acid sequences of which are at least 90%
homologous, preferably at least 95% homologous, with the amino acid
sequence shown as SEQ ID NO: 2 or 4 in the Sequence Listing.

In another aspect, the invention provides isolated and purified nucleic acid
molecules which have a nucleotide sequence coding for a polypeptide of
the invention, e.g. the SigA or SigB polypeptide. In a preferred form of the
invention, the said nucleic acid molecules are DNA molecules which have
a nucleotide sequence identical with SEQ ID NO: 1 or 3 of the Sequence
Listing. However, the nucleic acid molecules according to the invention are
not to be limited strictly to the DNA molecules with the sequence shown
as SEQ ID NO: 1 or 3. Rather the invention encompasses nucleic acid
molecules carrying modifications like substitutions, small deletions,
insertions or inversions, which nevertheless encode proteins having
substantially the biochemical activity of the polypeptides according to the
invention. Included in the invention are consequently DNA molecules, the
nucleotide sequences of which are at least 90% homologous, preferably at
least 95% homologous, with the nucleotide sequence shown as SEQ ID NO:
1 or 3 in the Sequence Listing.
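
By way of illustration only, a homology threshold of this kind can be checked computationally once two sequences have been aligned. The sketch below assumes that "homologous" is read as percent identity over a pre-computed pairwise alignment; the description does not specify the alignment method or the treatment of gaps, so the gap rule, the function names `percent_identity` and `meets_threshold`, and the variant sequence are all assumptions made for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; '-' marks a gap.

    Assumption: columns where both sequences carry a gap are ignored, and a
    column with a single gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the two aligned sequences reach the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    ref = "GTGGCAGCGACCAAAGCAAG"  # first 20 nt of the sigA ORF in SEQ ID NO: 1
    var = "GTGGCAGCGACGAAAGCAAG"  # hypothetical variant, one substitution
    print(percent_identity(ref, var))       # 95.0
    print(meets_threshold(ref, var, 95.0))  # True
```

Other conventions (for example, excluding gapped columns from the denominator) would give different percentages for the same alignment, so the convention used should be stated alongside any figure.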

Included in the invention are also DNA molecules whose nucleotide
sequences are degenerate, because of the genetic code, to the nucleotide
sequences shown as SEQ ID NO: 1 or 3. A sequential grouping of three
nucleotides, a "codon", codes for one amino acid. Since there are 64
possible codons, but only 20 natural amino acids, most amino acids are
coded for by more than one codon. This natural "degeneracy", or
"redundancy", of the genetic code is well known in the art. It will thus be
appreciated that the DNA sequence shown in the Sequence Listing is only
an example within a large but definite group of DNA sequences which will
encode the polypeptide as described above.
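
The size of this "large but definite group" can be made concrete with a short calculation. The following sketch uses the standard genetic code to count how many distinct DNA sequences encode a given peptide; the seven-residue peptide in the usage example is the one annotated under the sigA forward primer in Example 4, and the function name `count_encoding_sequences` is illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid ('*' denotes a stop codon).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encoding_sequences(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (one-letter codes)."""
    total = 1
    for aa in peptide.upper():
        if aa not in DEGENERACY:
            raise ValueError(f"unknown amino acid code: {aa!r}")
        total *= DEGENERACY[aa]
    return total


if __name__ == "__main__":
    print(len(CODON_TABLE))                     # 64
    print(count_encoding_sequences("MGYVAAT"))  # 2048
```

Because the count is a product over residues, it grows roughly exponentially with peptide length, which is why the specification refers to a group of sequences rather than listing them.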

Included in the invention are consequently isolated nucleic acid molecules
selected from:

(a) DNA molecules comprising a nucleotide sequence as shown in SEQ ID
NO: 1 or SEQ ID NO: 3 encoding a Group I sigma subunit of
Mycobacterium tuberculosis RNA polymerase;

(b) nucleic acid molecules comprising a nucleotide sequence capable of
hybridizing to a nucleotide sequence complementary to the polypeptide
coding region of a DNA molecule as defined in (a) and which codes for a
polypeptide which is a Group I sigma subunit of Mycobacterium tuberculosis
or a functionally equivalent modified form thereof; and

(c) nucleic acid molecules comprising a nucleic acid sequence which is
degenerate, as a result of the genetic code, to a nucleotide sequence as
defined in (a) or (b) and which codes for a polypeptide which is a Group I
sigma subunit of Mycobacterium tuberculosis or a functionally equivalent
modified form thereof.

The term "hybridizing to a nucleotide sequence" should be understood as
hybridizing to a nucleotide sequence, or a specific part thereof, under
stringent hybridization conditions which are known to a person skilled in
the art.

A DNA molecule of the invention may be in the form of a vector, e.g. a
replicable expression vector which carries and is capable of mediating the
expression of a DNA molecule according to the invention. In the present
context the term "replicable" means that the vector is able to replicate in a
given type of host cell into which it has been introduced. Examples of
vectors are viruses such as bacteriophages, cosmids, plasmids and other
recombination vectors. Nucleic acid molecules are inserted into vector
genomes by methods well known in the art. Vectors according to the
invention can include the plasmid vector pARC 8175 (NCIMB 40738) which
contains the coding sequence of the sigA gene, or pARC 8176 (NCIMB
40739) which contains the coding sequence of the sigB gene.

Included in the invention is also a host cell harbouring a vector according
to the invention. Such a host cell can be a prokaryotic cell, a unicellular
eukaryotic cell or a cell derived from a multicellular organism. The host
cell can thus e.g. be a bacterial cell such as an E. coli cell; a cell from a
yeast such as Saccharomyces cerevisiae or Pichia pastoris; or a mammalian cell.
The methods employed to effect introduction of the vector into the host
cell are standard methods well known to a person familiar with
recombinant DNA methods.

A further aspect of the invention is a process for production of a
polypeptide of the invention, comprising culturing host cells transformed
with an expression vector according to the invention under conditions
whereby said polypeptide is produced, and recovering said polypeptide.

The medium used to grow the cells may be any conventional medium
suitable for the purpose. A suitable vector may be any of the vectors
described above, and an appropriate host cell may be any of the cell types
listed above. The methods employed to construct the vector and effect
introduction thereof into the host cell may be any methods known for such
purposes within the field of recombinant DNA. The recombinant
polypeptide expressed by the cells may be secreted, i.e. exported through
the cell membrane, dependent on the type of cell and the composition of
the vector.

If the polypeptide is produced intracellularly by the recombinant host, i.e.
is not secreted by the cell, it may be recovered by standard procedures
comprising cell disruption by mechanical means, e.g. sonication or
homogenization, or by enzymatic or chemical means, followed by
purification.

In order to be secreted, the DNA sequence encoding the polypeptide 
should be preceded by a sequence coding for a signal peptide, the presence 
of which ensures secretion of the polypeptide from the cells so that at least 
a significant proportion of the polypeptide expressed is secreted into the 
culture medium and recovered. 

Another important aspect of the invention is a method of assaying for
compounds which have the ability to inhibit the association of a sigma
subunit with a Mycobacterium tuberculosis RNA polymerase, said method
comprising the use of a recombinant SigA or SigB polypeptide or a nucleic
acid molecule as defined above. Such a method will preferably comprise (i)
contacting a compound to be tested for such inhibition ability with a SigA
or SigB polypeptide as described above and a Mycobacterium tuberculosis
core RNA polymerase; and (ii) detecting whether the said polypeptide
associates with the said core RNA polymerase to form RNA polymerase
holoenzyme. The term "core RNA polymerase" is to be understood as an
RNA polymerase which comprises at least the α, β, and β' subunits, but
not the sigma subunit. The term "RNA polymerase holoenzyme" is to be
understood as an RNA polymerase comprising at least the α, β, β' and
sigma subunits. If desirable, the sigma subunit polypeptide can be labelled,
for example with a suitable radioactive molecule, e.g. or

Suitable methods for determining whether a sigma polypeptide has
associated with core RNA polymerase are disclosed by Lesley et al.
(Biochemistry 28, 7728-7734, 1989). Such a method may thus be based on
the size difference between sigma polypeptides bound to core RNA
polymerase, versus polypeptides not bound. This difference in size allows
the two forms to be separated by chromatography, e.g. on a gel filtration
column, such as a Waters Protein Pak® 300SW sizing column. The two
forms eluted from the column may be detected and quantified by known
methods, such as scintillation counting or SDS-PAGE followed by
immunoblotting.

According to another method also described by Lesley et al. (supra), RNA
polymerase holoenzyme is detected by immunoprecipitation using an
antibody binding to RNA polymerase holoenzyme. Core RNA polymerase
from an organism such as E. coli, M. tuberculosis or M. smegmatis can be
allowed to react with a radiolabeled SigA or SigB polypeptide. The
reaction mix is treated with a formalin-treated Staphylococcus aureus cell
suspension, pretreated with an anti-RNA polymerase antibody. The cell
suspension is washed to remove unbound proteins, resuspended in
SDS-PAGE sample buffer and separated on SDS-PAGE. Bound SigA or SigB
polypeptides are monitored by autoradiography followed by scintillation
counting.

Another method of assaying for compounds which have the ability to
inhibit sigma subunit-dependent transcription by a Mycobacterium
tuberculosis RNA polymerase can comprise (i) contacting a compound to be
tested for said inhibition ability with a polypeptide of the invention, a
Mycobacterium tuberculosis core RNA polymerase, and a DNA having a
coding sequence operably linked to a promoter sequence capable of
recognition by said core RNA polymerase when bound to said
polypeptide, said contacting being carried out under conditions suitable for
transcription of said coding sequence when Mycobacterium tuberculosis RNA
polymerase is bound to said promoter; and (ii) detecting formation of
mRNA corresponding to said coding sequence.

Such an assay is based on the fact that E. coli consensus promoter
sequences are not transcribable by core RNA polymerase lacking the sigma
subunit. However, addition of a sigma 70 protein will enable the complex
to recognise specific promoters and initiate transcription. Screening of
compounds which have the ability to inhibit sigma-dependent transcription
can thus be performed, using DNA containing a suitable promoter as a
template, by monitoring the formation of mRNA of specific lengths.
Transcription can be monitored by measuring incorporation of 3H-UTP
into TCA-precipitable counts (Ashok Kumar et al. (1994) J. Mol. Biol. 235,
405-413; Kajitani, M. and Ishihama, A. (1983) Nucleic Acids Res. 11, 671-686
and 3873-3888) and determining the length of the specific transcript.
Compounds which are identified by such an assay can inhibit transcription
by various mechanisms, such as (a) binding to a sigma protein and
preventing its association with the core RNA polymerase; (b) binding to
core RNA polymerase and sterically inhibiting the binding of a sigma
protein; or (c) inhibiting intermediate steps involved in the initiation or
elongation during transcription.
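
As a purely illustrative aid, the read-out of such a screen (for instance TCA-precipitable counts with and without a test compound) can be reduced to a single percent-inhibition figure. The sketch below is one common way of doing so and is an assumption of this example, not a procedure given in the description; the function name `percent_inhibition`, the background correction and all count values are hypothetical.

```python
def percent_inhibition(control_cpm: float, treated_cpm: float,
                       background_cpm: float = 0.0) -> float:
    """Percent reduction of the background-corrected signal.

    control_cpm:    counts from the reaction without test compound
    treated_cpm:    counts from the reaction with test compound
    background_cpm: counts from a blank (e.g. no sigma subunit added);
                    this correction is an assumption of the sketch
    """
    control = control_cpm - background_cpm
    treated = treated_cpm - background_cpm
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - treated / control)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(percent_inhibition(10500, 3000, 500))  # 75.0
```

A value near 0 indicates no effect of the compound, and a value near 100 indicates that the signal has fallen to background.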

A further aspect of the invention is a method of determining the protein
structure of a Mycobacterium tuberculosis RNA polymerase sigma subunit,
characterised in that a SigA or SigB polypeptide is utilized in X-ray
crystallography. The use of SigA or SigB polypeptide in crystallisation will
facilitate a rational design, based on X-ray crystallography, of therapeutic
compounds inhibiting interaction of a sigma 70 protein with the core RNA
polymerase, or alternatively inhibiting the binding of a sigma 70 protein, in
association with a core RNA polymerase, to DNA during the course of
gene transcription.



EXAMPLES 

EXAMPLE 1: Identification of M. tuberculosis DNA sequences homologous
to the sigma 70 gene

1.1. PCR amplification of putative sigma 70 homologues

The following PCR primers were designed, based on the conserved amino
acid sequences of sigma 43 (a sigma 70 homologue) of Bacillus subtilis and
sigma 70 of E. coli (Gitt, M.A. et al. (1985) J. Biol. Chem. 260, 7178-7185):

Forward primer (SEQ ID NO: 5):

5'-AAG TTC AGC ACG TAC GCC ACG TGG TGG ATC-3'
C G C

Reverse primer (SEQ ID NO: 6):

5'-CTT GGC CTC GAT CTG GCG GAT GCG CTC-3'
C C C

The alternative nucleotides indicated at certain positions indicate that the 
primers are degenerate primers suitable for amplification of the 
unidentified gene. 
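
A degenerate primer is in practice a mixture of oligonucleotides, one for each combination of the alternative nucleotides. The following sketch shows how such a mixture can be enumerated using the IUPAC ambiguity codes. The positions of the alternatives in SEQ ID NO: 5 and 6 are not legible in this text, so the example oligonucleotide `AARTTY` is hypothetical and chosen only to demonstrate the expansion; the function names are likewise illustrative.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    example = "AARTTY"  # hypothetical oligonucleotide, not taken from the source
    print(degeneracy(example))         # 4
    print(expand_degenerate(example))  # ['AAATTC', 'AAATTT', 'AAGTTC', 'AAGTTT']
```

A primer with three two-fold degenerate positions, as the three alternative nucleotides listed under each primer above would suggest, corresponds to a mixture of eight sequences.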

Chromosomal DNA from M. tuberculosis H37Rv (ATCC 27294) was
prepared following standard protocols. PCR amplification of a DNA
fragment of approximately 500 bp was carried out using the following
conditions:

Annealing:     +55°C   1 min
Denaturation:  +93°C   1 min
Extension:     +73°C   2 min

1.2. Southern hybridisation of M. tuberculosis DNA

Chromosomal DNA from M. tuberculosis H37Rv (ATCC 27294),
M. tuberculosis H37Ra and Mycobacterium smegmatis was prepared
following standard protocols and restricted with the restriction enzyme
SalI. The DNA fragments were resolved on a 1% agarose gel by
electrophoresis and transferred onto nylon membranes which were
subjected to "Southern blotting" analysis following standard procedures. To
detect homologous fragments, the membranes were probed with a
radioactively labelled ~500 bp DNA fragment, generated by PCR as
described above.

Analysis of the Southern hybridisation experiment revealed the presence of
at least three hybridising fragments of approximately 4.2, 2.2 and 0.9 kb,
respectively, in the SalI-digested DNA of both of the M. tuberculosis strains.
In M. smegmatis, two hybridising fragments of 4.2 and 2.2 kb, respectively,
were detected. It could be concluded that there were multiple DNA
fragments with homology to the known sigma 70 genes.

Similar Southern hybridisation experiments, performed with four different
clinical isolates of M. tuberculosis, revealed identical patterns, indicating the
presence of similar genes also in other virulent isolates of M. tuberculosis.

EXAMPLE 2: Cloning of putative sigma 70 homologues

2.1. Cloning of M. tuberculosis sigA

A lambda gt11 library (obtained from WHO) of the chromosomal DNA of
M. tuberculosis Erdman strain was screened, using the 500 bp PCR probe as
described above, following standard procedures. One lambda gt11 phage
with a 4.7 kb EcoRI insert was identified and confirmed to hybridise with
the PCR probe. Restriction analysis of this 4.7 kb insert revealed it to have
an internal 2.2 kb SalI fragment which hybridised with the PCR probe.

The 4.7 kb fragment was excised from the lambda gt11 DNA by EcoRI
restriction, and subcloned into the cloning vector pBR322, to obtain the
recombinant plasmid pARC 8175 (Fig. 1) (NCIMB 40738).

The putative sigma 70 homologue on the 2.2 kb SalI fragment was
designated M. tuberculosis sigA. The coding sequence of the sigA gene was
found to have an internal SalI site, which could explain the hybridisation
of the 0.9 kb fragment in the Southern experiments.

2.2. Cloning of M. tuberculosis sigB

M. tuberculosis H37Rv DNA was restricted with SalI and the DNA
fragments were resolved by preparative agarose gel electrophoresis. The
agarose gel piece corresponding to the 4.0 to 5.0 kb size region was cut
out, and the DNA from this gel piece was extracted following standard
protocols. This DNA was ligated to the cloning vector pBR329 at its SalI
site, and the ligated DNA was transformed into E. coli DH5α to obtain a
sub-library. Transformants of this sub-library were identified by colony
blotting, using the PCR-derived 500 bp probe, following standard
protocols. Individual transformant colonies were analyzed for their
plasmid profile. One of the recombinant plasmids retaining the expected
plasmid size was analyzed in detail by restriction mapping and was found
to harbour the expected 4.2 kb SalI DNA fragment. This plasmid with the
sigB gene on the 4.2 kb insert was designated pARC 8176 (Fig. 2) (NCIMB
40739).

EXAMPLE 3: Nucleotide sequence of M. tuberculosis sigA and sigB genes



3.1. Nucleotide sequence of sigA

The EcoRV-EcoRI DNA fragment expected to encompass the entire sigA
gene was subcloned into appropriate M13 vectors and both strands of the
gene sequenced by the dideoxy method. The sequence obtained is shown
as SEQ ID NO: 1 in the Sequence Listing. An open reading frame (ORF) of
1580 nucleotides (positions 70 to 1650 in SEQ ID NO: 1) coding for a
protein of 526 amino acids was predicted from the DNA sequence. The
N-terminal amino acid has been assigned tentatively based on the first GTG
(initiation codon) of the ORF.
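
The start of the predicted reading frame can be reproduced from the first 108 nucleotides of SEQ ID NO: 1 as given in the Sequence Listing. The sketch below, which is illustrative and uses the hypothetical helper `translate`, reads the sequence in frame from position 70 (1-based) using the standard genetic code and recovers the first thirteen residues shown in the listing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# First 108 nt of SEQ ID NO: 1, copied from the Sequence Listing.
SEQ1_PREFIX = (
    "AACTAGCAGA CACTTTCGGT TACGCACGCC CAGACCCAAC CGGAAGTGAG TAACGACCGA "
    "AGGGTGTAT GTG GCA GCG ACC AAA GCA AGC ACG GCG ACC GAT GAG CCG"
).replace(" ", "")


def translate(dna: str, start: int) -> str:
    """Translate in frame from a 1-based start position until a stop codon
    or the end of the sequence; an incomplete final codon is ignored."""
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(SEQ1_PREFIX[69:72])          # GTG
    # The table reads GTG as Val, matching the residue shown in the listing;
    # at an initiation site a GTG codon is decoded as methionine in vivo.
    print(translate(SEQ1_PREFIX, 70))  # VAATKASTATDEP
```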

The derived amino acid sequence of the gene product SigA (SEQ ID NO:
2) showed 60% identity with the E. coli sigma 70 and 70% identity with the
HrdB sequence of Streptomyces coelicolor. The overall anatomy of the SigA
sequence is compatible with that seen among sigma 70 proteins of various
organisms. This anatomy comprises a highly conserved C-terminal half,
while the N-terminal half generally shows less homology. The two
regions are linked by a stretch of amino acids which varies in length and is
found to be generally unique for the protein. The SigA sequence has a
similar structure, where the unconserved central stretch corresponds to
amino acids 270 to 306 in SEQ ID NO: 2.

The N-terminal half has limited homology to E. coli sigma 70, but shows
resemblance to that of the sigma 70 homologue HrdB of S. coelicolor. The
highly conserved motifs of regions 3.1, 3.2, 4.1 and 4.2 of S. coelicolor which
were proposed to be involved in DNA binding (Lonetto, M. et al. (1992)
J. Bacteriol. 174, 3843-3849) are found to be nearly identical also in the
M. tuberculosis SigA sequence. The N-terminal start of the protein has been
tentatively assigned, based on homologous motifs of the S. coelicolor HrdB
sequence.

The overall sequence similarity of the SigA and SigB amino acid sequences
to known sigma 70 sequences suggests assignment of the M. tuberculosis
SigA to the Group I sigma 70 proteins. However, SigA also shows distinct
differences with known sigma 70 proteins, in particular a unique and
lengthy N-terminal stretch of amino acids (positions 24 to 263 in SEQ ID
NO: 2), which may be essential for the recognition and initiation of
transcription from promoter sequences of M. tuberculosis.



3.2. Nucleotide sequence of sigB 

The nucleotide sequence of the sigB gene (SEQ ID NO: 3) encodes a protein
of 323 amino acids (SEQ ID NO: 4). The N-terminal start of the protein has
been tentatively identified based on the presence of the first methionine of
the ORF. The ORF is thus estimated to start at position 325 and to end at
1293 in SEQ ID NO: 3. Alignment of the amino acid sequence of the sigB
gene product with other sigma 70 proteins places the sigB gene into the
Group I family of sigma 70 proteins. The overall structure of the gene
product SigB follows the same pattern as described for SigA. However, the
SigB sequence has only 60% homology with the SigA sequence, as there are
considerable differences not only within the unconserved regions of the
protein, but also within the putative DNA binding regions of the SigB
protein. These characteristics suggest that the SigB protein may have a
distinct function in the physiology of the organism.
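
The relationship between ORF coordinates and protein length is simple arithmetic, sketched below for the sigB coordinates stated above. The helper names `orf_length` and `codon_count` are illustrative, and coordinates are treated as 1-based and inclusive, which is an assumption of this example consistent with the usual Sequence Listing convention.

```python
def orf_length(start: int, end: int) -> int:
    """Length in nucleotides of a region with 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of complete codons in the region; the length must be a
    multiple of three for the region to be a whole reading frame."""
    length = orf_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"length {length} is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    print(orf_length(325, 1293))   # 969
    print(codon_count(325, 1293))  # 323
```

Positions 325 to 1293 therefore span 969 nucleotides, or 323 codons, in agreement with the stated protein length of 323 amino acids if every codon in the range encodes a residue; the text does not say whether the range includes a stop codon.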




EXAMPLE 4: Expression of sigA and sigB 

4.1. Expression of M. tuberculosis sigA gene in E. coli 




The N-terminal portion of the sigA gene was amplified by PCR using the
following primers:

Forward primer (SEQ ID NO: 7), comprising an NcoI site:

               66nt               80nt
               |                  |
5'-TT CC ATG GGG TAT GTG GCA GCG ACC-3'
         M   G   Y   V   A   A   T

Reverse primer (SEQ ID NO: 8):

5'-GTA CAG GCC AGC CTC GAT CCG CTT GGC-3'
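
The two features claimed for the forward primer, namely an NcoI recognition site (CCATGG) and an in-frame ATG followed by the residues annotated beneath it, can be verified directly from its sequence. The following is a minimal sketch of such a check; the helper names `has_site` and `translate_from_first_atg` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

NCOI_SITE = "CCATGG"  # NcoI recognition sequence
# Forward primer (SEQ ID NO: 7) as printed above, spaces removed.
FORWARD_PRIMER = "TT CC ATG GGG TAT GTG GCA GCG ACC".replace(" ", "")


def has_site(sequence: str, site: str) -> bool:
    """True if the recognition sequence occurs in the given strand."""
    return site.upper() in sequence.upper()


def translate_from_first_atg(sequence: str) -> str:
    """Translate in frame from the first ATG until a stop codon or the end
    of the sequence; returns an empty string if no ATG is present."""
    seq = sequence.upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(has_site(FORWARD_PRIMER, NCOI_SITE))       # True
    print(translate_from_first_atg(FORWARD_PRIMER))  # MGYVAAT
```

The ATG of the NcoI site thus supplies the initiation codon, and the remaining codons of the primer reproduce the start of the sigA reading frame.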

(a) A fragment of approximately 750 bp was amplified from the sigA gene
construct pARC 8175. The amplified product was restricted with NcoI and
BamHI to obtain a 163 bp fragment.

(b) A 1400 bp DNA fragment was obtained by digestion of pARC 8175
with BamHI and EcoRV.

(c) The expression plasmid pET 8ck, which is a derivative of pET 8c
(Studier, F.W. et al. (1990) Methods Enzymol. 185, 61-89) in which the
β-lactamase gene has been replaced by the gene conferring kanamycin
resistance, was digested with NcoI and EcoRV and a fragment of
approximately 4.2 kb was purified.

These three fragments (a), (b) and (c) were ligated by standard methods
and the product was transformed into E. coli DH5α. Individual
transformants were screened for the plasmid profile following standard
protocols. The transformant was identified based on the expected plasmid
size (approximately 6.35 kb) and restriction mapping of the plasmid. The
recombinant plasmid harbouring the coding fragment of sigA was
designated pARC 8171.

The plasmid pARC 8171 was transformed into the T7 expression host
E. coli BL21(DE3). Individual transformants were screened for the presence
of the 6.35 kb plasmid and confirmed by restriction analysis. One of the
transformants was grown at 37°C and induced with 1 mM
isopropyl-β-D-thiogalactopyranoside (IPTG) using standard protocols. A
specific 90 kDa protein was induced on expression. Cells were harvested by
low speed centrifugation and lysed by sonication in phosphate buffered
saline, pH 7.4. The lysate was centrifuged at 100,000 x g to fractionate
into supernatant and pellet. The majority of the 70 kDa product obtained
after induction with IPTG was present in the pellet fraction, indicating
that the protein formed inclusion bodies.
For purifying the induced sigA gene product, the cell lysate as obtained
above was clarified by centrifugation at 1000 rpm in a Beckman JA 21 rotor
for 15 min. The clarified supernatant was layered on a 15-60% sucrose
gradient and centrifuged at 100,000 x g for 60 min. The inclusion bodies
sedimented as a pellet through the 60% sucrose cushion. This pellet was
solubilised in 6 M guanidine hydrochloride which was removed by
sequential dialysis against buffer containing decreasing concentrations of
guanidine hydrochloride. The dialysate was 75% enriched for the SigA
protein, which was purified essentially following the protocol for
purification of E. coli sigma 70 as described by Borukhov, S. and Goldfarb, A.
(1993) Protein Expression and Purification, vol. 4, 503-511.

4.2. Expression of M. tuberculosis sigB gene in E. coli

The sigB gene product was expressed and purified from inclusion bodies.
The coding sequence of the sigB gene was amplified by PCR using the
following primers:

Forward primer (SEQ ID NO: 9), comprising an NcoI restriction site:

5'-TTTC ATG GCC GAT GCA CCC ACA AGG GCC-3'
        M   A   D   A   P   T   R   A

Reverse primer (SEQ ID NO: 10), comprising an EcoRI restriction site:

5'-CTT GAA TTC AGC TGG CGT ACG ACC GCA-3'

The amplified 920 bp fragment was digested with EcoRI and NcoI and
ligated to the EcoRI- and NcoI-digested pRSET B (Kroll et al. (1993) DNA
and Cell Biology 12, 441). The ligation mix was transformed into E. coli
DH5α. Individual transformants were screened for plasmid profile and
restriction analysis. The recombinant plasmid having the expected plasmid
profile was designated pARC 8193.

E. coli DH5α harbouring pARC 8193 was cultured in LB containing 50
µg/ml ampicillin to an OD of 0.5, and induced with 1 mM IPTG at 37°C,
following standard protocols. The induced SigB protein was obtained as
inclusion bodies which were denatured and renatured following the same
protocol as described for the SigA protein. The purified SigB protein was
>90% homogeneous and suitable for transcription assays.

DEPOSIT OF MICROORGANISMS 

The following plasmids have been deposited under the Budapest Treaty at
the National Collections of Industrial and Marine Bacteria (NCIMB),
Aberdeen, Scotland, UK:

Plasmid      Accession No.   Date of deposit
pARC 8175    NCIMB 40738     15 June 1995
pARC 8176    NCIMB 40739     15 June 1995

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
    (A) NAME: Astra AB
    (B) STREET: Vastra Malarehamnen 9
    (C) CITY: Sodertalje
    (E) COUNTRY: Sweden
    (F) POSTAL CODE (ZIP): S-151 85
    (G) TELEPHONE: +46-8-553 260 00
    (H) TELEFAX: +46-8-553 288 20
    (I) TELEX: 19237 astra s

(ii) TITLE OF INVENTION: New DNA Molecules
(iii) NUMBER OF SEQUENCES: 10

(iv) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 1724 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: both
    (D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Mycobacterium tuberculosis
    (B) STRAIN: Erdman strain

(vii) IMMEDIATE SOURCE:
    (B) CLONE: pARC 8175

(ix) FEATURE:
    (A) NAME/KEY: CDS
    (B) LOCATION: 70..1653

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AACTAGCAGA CACTTTCGGT TACGCACGCC CAGACCCAAC CGGAAGTGAG TAACGACCGA 60 

AGGGTGTAT GTG GCA GCG ACC AAA GCA AGC ACG GCG ACC GAT GAG CCG 108 
Val Ala Ala Thr Lys Ala Ser Thr Ala Thr Asp Glu Pro 
1 5 10 

GTA AAA CGC ACC GCC ACC AAG TCG CCC GCG GCT TCC GCG TCC GGG GCC 156 
Val Lys Arg Thr Ala Thr Lys Ser Pro Ala Ala Ser Ala Ser Gly Ala 
15 20 25 

AAG ACC GGC GCC AAG CGA ACA GCG GCG AAG TCC GCT ACT GGC TCC CCA 204 
Lys Thr Gly Ala Lys Arg Thr Ala Ala Lys Ser Ala Ser Gly Ser Pro 
30 35 40 45 

CCC GCG AAG CGG GCT ACC AAG CCC GCG GCC CGG TCC GTC AAG CCC GCC 252 
Pro Ala Lys Arg Ala Thr Lys Pro Ala Ala Arg Ser Val Lys Pro Ala
50 55 60

TCG GCA CCC CAG GAC ACT ACG ACC AGC ACC ATC CCG AAA AGG AAG ACC 300
Ser Ala Pro Gln Asp Thr Thr Thr Ser Thr Ile Pro Lys Arg Lys Thr
65 70 75 

CGC GCC GCG GCC AAA TCC GCC GCC GCG AAG GCA CCG TCG GCC CGC GGC 348 
Arg Ala Ala Ala Lys Ser Ala Ala Ala Lys Ala Pro Ser Ala Arg Gly
80 85 90

CAC GCG ACC AAG CCA CGG GCG CCC AAG GAT GCC CAG CAC GAA GCC GCA 396 
His Ala Thr Lys Pro Arg Ala Pro Lys Asp Ala Gln His Glu Ala Ala
95 100 105 

ACG GAT CCC GAG GAC GCC CTG GAC TCC GTC GAG GAG CTC GAC GCT GAA 444 
Thr Asp Pro Glu Asp Ala Leu Asp Ser Val Glu Glu Leu Asp Ala Glu 
110 115 120 125 

CCA GAC CTC GAC GTC GAG CCC GGC GAG GAC CTC GAC CTT GAC GCC GCC 492 
Pro Asp Leu Asp Val Glu Pro Gly Glu Asp Leu Asp Leu Asp Ala Ala 
130 135 140 

GAC CTC AAC CTC GAT GAC CTC GAG GAC GAC GTG GCG CCG GAC GCC GAC 540 
Asp Leu Asn Leu Asp Asp Leu Glu Asp Asp Val Ala Pro Asp Ala Asp
145 150 155

GAC GAC CTC GAC TCG GGC GAC GAC GAA GAC CAC GAA GAC CTC GAA GCT 588 
Asp Asp Leu Asp Ser Gly Asp Asp Glu Asp His Glu Asp Leu Glu Ala 
160 165 170 

GAG GCG GCC GTC GCG CCC GGC CAC ACC GCC GAT GAC GAC GAG GAG ATC 636 
Glu Ala Ala Val Ala Pro Gly Gln Thr Ala Asp Asp Asp Glu Glu Ile
175 180 185 

GCT GAA CCC ACC GAA AAG GAC AAG GCC TCC GGT GAT TTC GTC TGG GAT 684 
Ala Glu Pro Thr Glu Lys Asp Lys Ala Ser Gly Asp Phe Val Trp Asp 
190 195 200 205 

GAA GAC GAG TCG GAG GCC CTG CGT CAA GCA CGC AAG GAC GCC GAA CTC 732 
Glu Asp Glu Ser Glu Ala Leu Arg Gln Ala Arg Lys Asp Ala Glu Leu
210 215 220 

ACC GCA TCC GCC GAC TCG GTT CGC GCC TAC CTC AAA CAG ATC GGC AAG 780 
Thr Ala Ser Ala Asp Ser Val Arg Ala Tyr Leu Lys Gln Ile Gly Lys
225 230 235 

GTA GCG CTG CTC AAC GCC GAG GAA GAG GTC GAG CTA GCC AAG CGG ATC 828 
Val Ala Leu Leu Asn Ala Glu Glu Glu Val Glu Leu Ala Lys Arg Ile
240 245 250 

GAG GCT GGC CTG TAC GCC ACG CAG CTG ATG ACC GAG CTT AGC GAG CGC 876 
Glu Ala Gly Leu Tyr Ala Thr Gln Leu Met Thr Glu Leu Ser Glu Arg
255 260 265 

GGC GAA AAG CTG CCT GCC GCC CAG CGC CGC GAC ATG ATG TGG ATC TGC 924 
Gly Glu Lys Leu Pro Ala Ala Gln Arg Arg Asp Met Met Trp Ile Cys
270 275 280 285 

CGC GAC GGC GAT CGC GCG AAA AAC CAT CTG CTG GAA GCC AAC CTG CGC 972 
Arg Asp Gly Asp Arg Ala Lys Asn His Leu Leu Glu Ala Asn Leu Arg 
290 295 300 

CTG GTG GTT TCG CTA GCC AAG CGC TAC ACC GGC CGG GGC ATG GCG TTT 1020 
Leu Val Val Ser Leu Ala Lys Arg Tyr Thr Gly Arg Gly Met Ala Phe 
305 310 315 

CTC GAC CTG ATC CAG GAA GGC AAC CTG GGG CTG ATC CGC GCG GTG GAG 1068 
Leu Asp Leu Ile Gln Glu Gly Asn Leu Gly Leu Ile Arg Ala Val Glu
320 325 330 
AAG TTC GAC TAC ACC AAG GGG TAC AAG TTC TCC ACC TAC GCT ACG TGG 1116
Lys Phe Asp Tyr Thr Lys Gly Tyr Lys Phe Ser Thr Tyr Ala Thr Trp 
335 340 345 

TGG ATT CGC CAG GCC ATC ACC CGC GCC ATG GCC GAC CAG GCC CGC ACC 1164 
Trp Ile Arg Gln Ala Ile Thr Arg Ala Met Ala Asp Gln Ala Arg Thr
350 355 360 365 

ATC CGC ATC CCG GTG CAC ATG GTC GAG GTG ATC AAC AAG CTG GGC CGC 1212 
Ile Arg Ile Pro Val His Met Val Glu Val Ile Asn Lys Leu Gly Arg
370 375 380 

ATT CAA CGC GAG CTG CTG CAG GAC CTG GGC CGC GAG CCC ACG CCC GAG 1260 
Ile Gln Arg Glu Leu Leu Gln Asp Leu Gly Arg Glu Pro Thr Pro Glu
385 390 395 

GAG CTG GCC AAA CAG ATG GAC ATC ACC CCG GAG AAG GTG CTG GAA ATC 1308 
Glu Leu Ala Lys Glu Met Asp Ile Thr Pro Glu Lys Val Leu Glu Ile
400 405 410 

CAG CAA TAC GCC CGC GAG CCG ATC TCG TTG GAC CAG ACC ATC GGC GAC 1356 
Gln Gln Tyr Ala Arg Glu Pro Ile Ser Leu Asp Gln Thr Ile Gly Asp
415 420 425 

GAG GGC GAC AGC CAG CTT GGC GAT TTC ATC GAA GAC AGC GAG GCG GTG 1404 
Glu Gly Asp Ser Gln Leu Gly Asp Phe Ile Glu Asp Ser Glu Ala Val
430 435 440 445 

GTG GCC GTC GAC GCG GTG TCC TTC ACT TTG CTG CAG GAT CAA CTG CAG 1452 
Val Ala Val Asp Ala Val Ser Phe Thr Leu Leu Gln Asp Gln Leu Gln
450 455 460 

TCG GTG CTG GAC ACG CTC TCC GAG CGT GAG GCG GGC GTG GTG CGG CTA 1500 
Ser Val Leu Asp Thr Leu Ser Glu Arg Glu Ala Gly Val Val Arg Leu 
465 470 475 

CGC TTC GGC CTT ACC GAC GGC CAG CCG CGC ACC CTT GAC GAG ATC GGC 1548 
Arg Phe Gly Leu Thr Asp Gly Gln Pro Arg Thr Leu Asp Glu Ile Gly
480 485 490 

CAG GTC TAC GGC GTG ACC CGG GAA CGC ATC CGC CAG ATC GAA TCC AAG 1596 
Gln Val Tyr Gly Val Thr Arg Glu Arg Ile Arg Gln Ile Glu Ser Lys
495 500 505 

ACT ATG TCG AAG TTG CGC CAT CCG AGC CGC TCA CAG GTC CTG CGC GAC 1644 
Thr Met Ser Lys Leu Arg His Pro Ser Arg Ser Gln Val Leu Arg Asp
510 515 520 525 

TAC CTG GAC TGAGAGCGCC CGCCGAGGCG ACCAACGTAG CACGTGAGCC 1693 
Tyr Leu Asp 

CCCAGCAGCT AGCCGCACCA TGGTCTCGTC C 1724 
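
The CDS coordinates in the feature table can be cross-checked against the numbered translation printed beneath the nucleotide sequence. The following is only an illustrative sketch and not part of the listing as filed: the function name `cds_codon_count` is hypothetical, and it assumes 1-based inclusive coordinates with the stop codon lying outside the stated range (here, TGA immediately follows position 1653).

```python
def cds_codon_count(start: int, end: int) -> int:
    """Return the number of codons in a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        # A coding region that is not a whole number of codons suggests
        # a transcription error in the coordinates.
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# Coordinates taken from the FEATURE entry of SEQ ID NO: 1 (CDS 70..1653).
print(cds_codon_count(70, 1653))  # 528
```

Under these assumptions the range 70..1653 spans 1584 nucleotides, i.e. 528 codons, which agrees with the last numbered residue (525) plus the three residues that follow it.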

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 528 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Val Ala Ala Thr Lys Ala Ser Thr Ala Thr Asp Glu Pro Val Lys Arg
1 5 10 15

Thr Ala Thr Lys Ser Pro Ala Ala Ser Ala Ser Gly Ala Lys Thr Gly
20 25 30

Ala Lys Arg Thr Ala Ala Lys Ser Ala Ser Gly Ser Pro Pro Ala Lys
35 40 45

Arg Ala Thr Lys Pro Ala Ala Arg Ser Val Lys Pro Ala Ser Ala Pro
50 55 60

Gln Asp Thr Thr Thr Ser Thr Ile Pro Lys Arg Lys Thr Arg Ala Ala
65 70 75 80

Ala Lys Ser Ala Ala Ala Lys Ala Pro Ser Ala Arg Gly His Ala Thr
85 90 95

Lys Pro Arg Ala Pro Lys Asp Ala Gln His Glu Ala Ala Thr Asp Pro
100 105 110

Glu Asp Ala Leu Asp Ser Val Glu Glu Leu Asp Ala Glu Pro Asp Leu
115 120 125

Asp Val Glu Pro Gly Glu Asp Leu Asp Leu Asp Ala Ala Asp Leu Asn
130 135 140

Leu Asp Asp Leu Glu Asp Asp Val Ala Pro Asp Ala Asp Asp Asp Leu
145 150 155 160

Asp Ser Gly Asp Asp Glu Asp His Glu Asp Leu Glu Ala Glu Ala Ala
165 170 175

Val Ala Pro Gly Gln Thr Ala Asp Asp Asp Glu Glu Ile Ala Glu Pro
180 185 190

Thr Glu Lys Asp Lys Ala Ser Gly Asp Phe Val Trp Asp Glu Asp Glu
195 200 205

Ser Glu Ala Leu Arg Gln Ala Arg Lys Asp Ala Glu Leu Thr Ala Ser
210 215 220

Ala Asp Ser Val Arg Ala Tyr Leu Lys Gln Ile Gly Lys Val Ala Leu
225 230 235 240

Leu Asn Ala Glu Glu Glu Val Glu Leu Ala Lys Arg Ile Glu Ala Gly
245 250 255

Leu Tyr Ala Thr Gln Leu Met Thr Glu Leu Ser Glu Arg Gly Glu Lys
260 265 270

Leu Pro Ala Ala Gln Arg Arg Asp Met Met Trp Ile Cys Arg Asp Gly
275 280 285

Asp Arg Ala Lys Asn His Leu Leu Glu Ala Asn Leu Arg Leu Val Val
290 295 300

Ser Leu Ala Lys Arg Tyr Thr Gly Arg Gly Met Ala Phe Leu Asp Leu
305 310 315 320

Ile Gln Glu Gly Asn Leu Gly Leu Ile Arg Ala Val Glu Lys Phe Asp
325 330 335

Tyr Thr Lys Gly Tyr Lys Phe Ser Thr Tyr Ala Thr Trp Trp Ile Arg
340 345 350

Gln Ala Ile Thr Arg Ala Met Ala Asp Gln Ala Arg Thr Ile Arg Ile
355 360 365

Pro Val His Met Val Glu Val Ile Asn Lys Leu Gly Arg Ile Gln Arg
370 375 380

Glu Leu Leu Gln Asp Leu Gly Arg Glu Pro Thr Pro Glu Glu Leu Ala
385 390 395 400

Lys Glu Met Asp Ile Thr Pro Glu Lys Val Leu Glu Ile Gln Gln Tyr
405 410 415

Ala Arg Glu Pro Ile Ser Leu Asp Gln Thr Ile Gly Asp Glu Gly Asp
420 425 430

Ser Gln Leu Gly Asp Phe Ile Glu Asp Ser Glu Ala Val Val Ala Val
435 440 445

Asp Ala Val Ser Phe Thr Leu Leu Gln Asp Gln Leu Gln Ser Val Leu
450 455 460

Asp Thr Leu Ser Glu Arg Glu Ala Gly Val Val Arg Leu Arg Phe Gly
465 470 475 480

Leu Thr Asp Gly Gln Pro Arg Thr Leu Asp Glu Ile Gly Gln Val Tyr
485 490 495

Gly Val Thr Arg Glu Arg Ile Arg Gln Ile Glu Ser Lys Thr Met Ser
500 505 510

Lys Leu Arg His Pro Ser Arg Ser Gln Val Leu Arg Asp Tyr Leu Asp
515 520 525



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1508 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : both 

(D) TOPOLOGY: linear 



(vi) ORIGINAL SOURCE:

(A) ORGANISM: Mycobacterium tuberculosis 
(C) INDIVIDUAL ISOLATE: ATCC 27294

(vii) IMMEDIATE SOURCE: 

(B) CLONE: pARC 8176 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 325..1293



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ACCAGCCCGA CGACCGACGA ACCCCGCCGC TTCGACGTGC CCAGCCGGCG CATCCCGCTG 60 

TTCCCGACCG CGAACGGCCC GCACTCGAGC CGACGGCGAC AGCCGGCAAG AAGCGGTCAG 120 

CCCGCGGGGA TTCGCCGACC ACGGTTAGCC GTCTGTTGGC CGGCGTTCCG GGTTGTCGCC 180 

ACTGGCCACA CTTCTCAGGA CTTTCTCAGG TCTTCGGCAG ATTCCTGCAC GTCACAGGGC 240 

GTCAGATCAC TGCTGGGTGG GAACTCAAAG TCCGGCTTTG TCGTTAAACC CTGACAGTGC 300 

AAGCCGATCG GGGAACGGCT CGCT ATG GCC GAT GCA CCC ACA AGG GCC ACC 351 

Met Ala Asp Ala Pro Thr Arg Ala Thr 
530 535 
ACA AGC CGG GTT GAC ACA GAT CTG GAT GCT CAA AGC CCC GCG GCG GAC 399
Thr Ser Arg Val Asp Thr Asp Leu Asp Ala Gln Ser Pro Ala Ala Asp
540 545 550 

CTC GTG CGC GTC TAT CTG AAC GGC ATC GGC AAG ACG GCG TTG CTC AAC 447 
Leu Val Arg Val Tyr Leu Asn Gly Ile Gly Lys Thr Ala Leu Leu Asn
555 560 565 

GCG GCG GAT GAA GTC GAA CTG GCC AAG CGC ATA GAA GCC GGG TTG TAT 495 
Ala Ala Asp Glu Val Glu Leu Ala Lys Arg Ile Glu Ala Gly Leu Tyr
570 575 580 585 

GCC GAG CAT CTG CTG GAA ACC CGG AAG CGC CTC GGC GAG AAC CGA AAA 543 
Ala Glu His Leu Leu Glu Thr Arg Lys Arg Leu Gly Glu Asn Arg Lys
590 595 600 

CGC GAC CTG GCG GCC GTG GTG CGT GAT GGC GAG GCC GCC CGC CGC CAC 591 
Arg Asp Leu Ala Ala Val Val Arg Asp Gly Glu Ala Ala Arg Arg His 
605 610 615

CTG CTG GAA GCA AAC CTG CGG CTG GTG GTA TCG CTG GCC AAG CGC TAC 639 
Leu Leu Glu Ala Asn Leu Arg Leu Val Val Ser Leu Ala Lys Arg Tyr
620 625 630 

ACG GGT CGG GGC ATG CCG TTG CTG GAC CTC ATC CAG GAG GGC AAC CTG 687 
Thr Gly Arg Gly Met Pro Leu Leu Asp Leu Ile Gln Glu Gly Asn Leu
635 640 645 

GGT CTG ATC CGA GCG ATG GAG AAG TTC GAC TAC ACA AAG GGA TTC AAG 735 
Gly Leu Ile Arg Ala Met Glu Lys Phe Asp Tyr Thr Lys Gly Phe Lys
650 655 660 665 

TTC TCA ACG TAT GCC ACG TGG TGG ATC CGC CAG GCC ATC ACC CGC GGA 783 
Phe Ser Thr Tyr Ala Thr Trp Trp Ile Arg Gln Ala Ile Thr Arg Gly
670 675 680 

ATG GCC GAC CAG AGC CGC ACC ATC CGC CTG CCC GTA CAC CTG GTT GAG 831 
Met Ala Asp Gln Ser Arg Thr Ile Arg Leu Pro Val His Leu Val Glu
685 690 695 

CAG GTC AAC AAG CTG GCG CGG ATC AAG CGG GAG ATG CAC CAG CAT CTG 879 
Gln Val Asn Lys Leu Ala Arg Ile Lys Arg Glu Met His Gln His Leu
700 705 710

GGT CGC GAA CGC ACC GAT GAG GAG CTC GCC GCC GAA TCC GGC ATT CCA 927 
Gly Arg Glu Arg Thr Asp Glu Glu Leu Ala Ala Glu Ser Gly Ile Pro
715 720 725 

ATC GAC AAG ATC AAC GAC CTG CTG GAA CAC ACT CGC GAC CCG GTG AGT 975 
Ile Asp Lys Ile Asn Asp Leu Leu Glu His Ser Arg Asp Pro Val Ser
730 735 740 745 

CTG GAT ATG CCG GTC GGC TCC GAG GAG GAG GCC CCT TTG GGC GAT TTC 1023 
Leu Asp Met Pro Val Gly Ser Glu Glu Glu Ala Pro Leu Gly Asp Phe 
750 755 760 

ATC GAG GAC GCC GAA GCC ATG TCC GCG GAG AAC GCG GTC ATC GCC GAA 1071 
Ile Glu Asp Ala Glu Ala Met Ser Ala Glu Asn Ala Val Ile Ala Glu
765 770 775 

CTG TTA CAC ACC GAC ATC CGC AGC GTG CTG GCC ACT CTC GAC GAG CGT 1119 
Leu Leu His Thr Asp Ile Arg Ser Val Leu Ala Thr Leu Asp Glu Arg
780 785 790 

GAC GAC CAG GTG ATC CGG CTG CGC TTC GGC CTG GAT GAC GGC CAA CCA 1167 
Asp Asp Gln Val Ile Arg Leu Arg Phe Gly Leu Asp Asp Gly Gln Pro
795 800 805 






WO 96/38478 PCT/SE96/00319 


CGC ACC - ?G GAT CAA ATC GGC AAA CTA TTC GGG CTG TCC CGT GAG CGC 1215 

Arg Thr Leu Asp Gln Ile Gly Lys Leu Phe Gly Leu Ser Arg Glu Arg
810 815 820 825 

GTT CGT CAG ATC GAG CGC GAC GTG ATG ACT AAG CTG CGG CAC GGT GAG 1263 
Val Arg Gln Ile Glu Arg Asp Val Met Ser Lys Leu Arg His Gly Glu
830 835 840 

CGG GCG GAT CGG CTG CGG TCG TAC GCC AGC TGAAGCTGGA CATCCTGAGC 1313
Arg Ala Asp Arg Leu Arg Ser Tyr Ala Ser 
845 850 

CAGGTAGCAG ACGGTATGCC CGCCGCGCCA GCATAGCCTG CGGTGGGGCG GCGGGCAACC 1373 

ATTTTCGCAG CTGGCCAAGT GTAGACTCAG CTGCAATGGA GGGTGCTGAA TGAACGAGTT 1433 

GGTTGATACC ACCGAGATGT ACCTGCGGAC CATCTACGAC CTCGAGGAAG AGGGCGTGAC 1493

GCACTGCGTG CCGGA 1508 
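
The translation printed beneath the nucleotide sequence can be spot-checked with the standard genetic code. The sketch below is illustrative only and is not part of the listing as filed; the names `CODON_TABLE` and `translate` are hypothetical helpers, and the input is the first nine codons of the coding region (nucleotides 325 to 351) as printed above. Output uses one-letter amino acid codes rather than the three-letter codes of the listing.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; spaces are ignored."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First nine codons of the CDS in SEQ ID NO: 3 (nucleotides 325-351).
print(translate("ATG GCC GAT GCA CCC ACA AGG GCC ACC"))  # MADAPTRAT
```

The result, MADAPTRAT, corresponds to Met Ala Asp Ala Pro Thr Arg Ala Thr, the first nine residues shown in the listing.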

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 323 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Ala Asp Ala Pro Thr Arg Ala Thr Thr Ser Arg Val Asp Thr Asp
1 5 10 15

Leu Asp Ala Gln Ser Pro Ala Ala Asp Leu Val Arg Val Tyr Leu Asn
20 25 30

Gly Ile Gly Lys Thr Ala Leu Leu Asn Ala Ala Asp Glu Val Glu Leu
35 40 45

Ala Lys Arg Ile Glu Ala Gly Leu Tyr Ala Glu His Leu Leu Glu Thr
50 55 60

Arg Lys Arg Leu Gly Glu Asn Arg Lys Arg Asp Leu Ala Ala Val Val
65 70 75 80

Arg Asp Gly Glu Ala Ala Arg Arg His Leu Leu Glu Ala Asn Leu Arg
85 90 95

Leu Val Val Ser Leu Ala Lys Arg Tyr Thr Gly Arg Gly Met Pro Leu
100 105 110

Leu Asp Leu Ile Gln Glu Gly Asn Leu Gly Leu Ile Arg Ala Met Glu
115 120 125

Lys Phe Asp Tyr Thr Lys Gly Phe Lys Phe Ser Thr Tyr Ala Thr Trp
130 135 140

Trp Ile Arg Gln Ala Ile Thr Arg Gly Met Ala Asp Gln Ser Arg Thr
145 150 155 160

Ile Arg Leu Pro Val His Leu Val Glu Gln Val Asn Lys Leu Ala Arg
165 170 175

Ile Lys Arg Glu Met His Gln His Leu Gly Arg Glu Arg Thr Asp Glu
180 185 190

Glu Leu Ala Ala Glu Ser Gly Ile Pro Ile Asp Lys Ile Asn Asp Leu
195 200 205

Leu Glu His Ser Arg Asp Pro Val Ser Leu Asp Met Pro Val Gly Ser
210 215 220

Glu Glu Glu Ala Pro Leu Gly Asp Phe Ile Glu Asp Ala Glu Ala Met
225 230 235 240

Ser Ala Glu Asn Ala Val Ile Ala Glu Leu Leu His Thr Asp Ile Arg
245 250 255

Ser Val Leu Ala Thr Leu Asp Glu Arg Asp Asp Gln Val Ile Arg Leu
260 265 270

Arg Phe Gly Leu Asp Asp Gly Gln Pro Arg Thr Leu Asp Gln Ile Gly
275 280 285

Lys Leu Phe Gly Leu Ser Arg Glu Arg Val Arg Gln Ile Glu Arg Asp
290 295 300

Val Met Ser Lys Leu Arg His Gly Glu Arg Ala Asp Arg Leu Arg Ser
305 310 315 320

Tyr Ala Ser

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
AAGTTCAGCA CSTACGCSAC STGGTGGATC 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CTTSGCCTCG ATCTGSCGGA TSCGCTC 
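
SEQ ID NO: 5 and SEQ ID NO: 6 each contain the symbol S, which in IUPAC nucleotide notation denotes C or G, so each primer represents a small pool of sequences. The sketch below is illustrative only and is not part of the listing as filed; the names `IUPAC` and `expand_degenerate` are hypothetical helpers that enumerate every concrete sequence covered by a degenerate primer.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """Return all concrete sequences represented by a degenerate primer."""
    primer = primer.replace(" ", "").upper()
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


# SEQ ID NO: 5 as printed in the listing (three S positions).
variants = expand_degenerate("AAGTTCAGCA CSTACGCSAC STGGTGGATC")
print(len(variants))  # 8
```

With three degenerate S positions, each primer expands to 2 x 2 x 2 = 8 distinct sequences.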
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
TTCCATGGGG TATGTGGCAG CGACC 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GTACAGGCCA GCCTCGATCC GCTTGGC 
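
SEQ ID NO: 8 appears to be an antisense primer: its reverse complement matches nucleotides 817 to 843 of SEQ ID NO: 1 as printed in this listing. The sketch below is illustrative only and is not part of the listing as filed; `reverse_complement` is a hypothetical helper, and the sense-strand fragment was copied by hand from SEQ ID NO: 1.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string; spaces are ignored."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


seq8 = "GTACAGGCCA GCCTCGATCC GCTTGGC"           # SEQ ID NO: 8 as printed
sense_fragment = "GCCAAGCGGATCGAGGCTGGCCTGTAC"  # SEQ ID NO: 1, nt 817-843

print(reverse_complement(seq8) == sense_fragment)  # True
```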

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 28 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
TTTCATGGCC GATGCACCCA CAAGGGCC 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CTTGAATTCA GCTGGCGTAC GACCGCA 
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description
on page ]Z .line 23 



B. IDENTIFICATION OF DEPOSIT

Further deposits are identified on an additional sheet [ ]

Name of depositary institution

The National Collections of Industrial and Marine Bacteria Limited (NCIMB)

Address of depositary institution (including postal code and country)

23 St Machar Drive
Aberdeen AB2 1RY
Scotland, UK

Date of deposit

15 June 1994

Accession Number

NCIMB 40738

C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet [ ]



In respect of all designated states in which such action is possible and to the extent that it is 
legally permissible under the law of the designated state, it is requested that a sample of the 
deposited micro-organism be made available only by the issue thereof to an independent expert, 
in accordance with the relevant patent legislation, e.g. Rule 28(4) EPC, and generally similar 
provisions mutatis mutandis for any other designated state. 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 



The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications e.g., "Accession Number of Deposit")

For receiving Office use only

[X] This sheet was received with the international application

12-03-1996

Authorized officer

For International Bureau use only

[ ] This sheet was received by the International Bureau on:

Authorized officer

Form PCT/RO/134 (July 1992)
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description
on page ^ , line £4 



B. IDENTIFICATION OF DEPOSIT

Further deposits are identified on an additional sheet [ ]

Name of depositary institution

The National Collections of Industrial and Marine Bacteria Limited (NCIMB)

Address of depositary institution (including postal code and country)

23 St Machar Drive
Aberdeen AB2 1RY
Scotland, UK

Date of deposit

15 June 1994

Accession Number

NCIMB 40739

C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet [ ]



In respect of all designated states in which such action is possible and to the extent that it is 
legally permissible under the law of the designated state, it is requested that a sample of the 
deposited micro-organism be made available only by the issue thereof to an independent expert, 
in accordance with the relevant patent legislation, e.g. Rule 28(4) EPC, and generally similar 
provisions mutatis mutandis for any other designated state. 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications e.g., "Accession Number of Deposit")

For receiving Office use only

[X] This sheet was received with the international application

12-03-1996

Authorized officer

For International Bureau use only

[ ] This sheet was received by the International Bureau on:

Authorized officer

Form PCT/RO/134 (July 1992)



CLAIMS



1. An isolated polypeptide which is a Group I sigma subunit of 
Mycobacterium tuberculosis RNA polymerase, or a functionally 
equivalent modified form thereof. 

2. A polypeptide according to claim 1, the amino acid sequence of which
is identical to, or substantially similar to, SEQ ID NO: 2 or 4 in the
Sequence Listing.

3. An isolated nucleic acid molecule which has a nucleotide sequence 
coding for a polypeptide according to claim 1 or 2. 

4. An isolated nucleic acid molecule selected from: 

(a) DNA molecules comprising a nucleotide sequence as shown in 
SEQ ID NO: 1 or SEQ ID NO: 3 encoding a Group I sigma subunit 
of Mycobacterium tuberculosis RNA polymerase; 

(b) nucleic acid molecules comprising a nucleotide sequence capable 
of hybridizing to a nucleotide sequence complementary to the
polypeptide coding region of a DNA molecule as defined in (a) and 
which codes for a polypeptide which is a Group I sigma subunit of 
Mycobacterium tuberculosis or a functionally equivalent modified form 
thereof; and 

(c) nucleic acid molecules comprising a nucleic acid sequence which 
is degenerate, as a result of the genetic code, to a nucleotide 
sequence as defined in (a) or (b) and which codes for a polypeptide 
which is a Group I sigma subunit of Mycobacterium tuberculosis or a 
functionally equivalent modified form thereof. 

5. A vector which comprises a nucleic acid molecule according to claim 
3 or 4. 




6. A vector according to claim 5 which is the plasmid vector pARC 
8175 (NCIMB 40738) or pARC 8176 (NCIMB 40739).

7. A vector according to claim 5 which is an expression vector capable
of mediating the expression of a polypeptide according to claim 1 or 2.

8. A host cell harbouring a vector according to any one of claims 5 to 7. 

9. A process for production of a polypeptide according to claim 1 or 2
which comprises culturing a host cell according to claim 8 
transformed with an expression vector according to claim 7 under 
conditions whereby said polypeptide is produced and recovering 
said polypeptide. 


10. A method of assaying for compounds which have the ability to
inhibit the association of a sigma subunit with a Mycobacterium
tuberculosis core RNA polymerase, said method comprising (i)
contacting a compound to be tested for said inhibition ability with a
polypeptide according to claim 1 or claim 2 and a Mycobacterium
tuberculosis core RNA polymerase; and (ii) detecting whether the said
polypeptide associates with the said core RNA polymerase to form
RNA polymerase holoenzyme.

11. A method according to claim 10 wherein polypeptides which are
associated to core RNA polymerase and/or polypeptides which are
not associated to core RNA polymerase are detected by
chromatography such as gel filtration.

12. A method according to claim 10 wherein RNA polymerase
holoenzyme is detected by immunoprecipitation, using an antibody
binding to RNA polymerase holoenzyme.



13. A method of assaying for compounds which have the ability to
inhibit sigma subunit-dependent transcription by a Mycobacterium 
tuberculosis RNA polymerase, said method comprising (i) contacting 
a compound to be tested for said inhibition ability with a 
polypeptide according to claim 1 or claim 2, a Mycobacterium 
tuberculosis core RNA polymerase, and a DNA having a coding 
sequence operably-linked to a promoter sequence capable of 
recognition by said core RNA polymerase when bound to said 
polypeptide, said contacting being carried out under conditions 
suitable for transcription of said coding sequence when 
Mycobacterium tuberculosis RNA polymerase is bound to said 
promoter; and (ii) detecting formation of mRNA corresponding to 
said coding sequence. 



14. A method of determining the protein structure of a Mycobacterium
tuberculosis RNA polymerase sigma subunit, characterised in that a 
polypeptide according to claim 1 or claim 2 is utilized in X-ray 
crystallography. 


